cd /usr/lib/MineSet/demos/movies ; ./playmovies -l
We've provided the scripts of all the movies, in case your machine doesn't have audio enabled.
Decision Tree | Movie [9.59 MB] | Script
Regression Tree | Movie [7.27 MB] | Script
Decision Table | Movie [4.96 MB] | Script
Eviviz | Movie [5.17 MB] | Script
Splatviz | Movie [27.02 MB] | Script
Mapviz | Movie [6.69 MB] | Script
Scatterviz | Movie [9.90 MB] | Script |
You are viewing a decision tree that was automatically built by MineSet to explain when working adults in the US earn more than $50,000.
The two bars on the root node represent different classes:
The decision tree tells us that the most important factor for distinguishing the two classes is age. It also determines the threshold age of 27 as crucial. People under 27 are represented by the left subtree, while people over 27 are represented by the right subtree.
In the left child, representing people under 27, about 97% earn less than $50,000. Going to the right, the decision tree shows us that education is an important factor. The Census Bureau assigns numbers for different levels of education. 13 represents a bachelor's degree, and MineSet determined that this was a crucial threshold. One can see that the distribution in the two children is very different. People over 27 without a bachelor's degree are represented in the left subtree, which shows 21% earning over $50,000. The right subtree shows that the bachelor's degree increases that percentage to 55%.
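The way a threshold split changes the class distribution in each child can be sketched in a few lines. This is only an illustration: the helper names, field names, and the toy sample are hypothetical, and it is not MineSet's own tree-building code or the census data used in the movie.

```python
from collections import Counter

def class_proportions(records, label_key):
    """Return {class: fraction of records} for a list of record dicts."""
    counts = Counter(r[label_key] for r in records)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}

def split_on_threshold(records, attr, threshold):
    """Partition records into (left, right): attr <= threshold vs. attr > threshold."""
    left = [r for r in records if r[attr] <= threshold]
    right = [r for r in records if r[attr] > threshold]
    return left, right

# Hypothetical toy sample, not the data shown in the movie.
sample = [
    {"age": 22, "income": "<=50K"},
    {"age": 25, "income": "<=50K"},
    {"age": 26, "income": "<=50K"},
    {"age": 35, "income": ">50K"},
    {"age": 41, "income": "<=50K"},
    {"age": 52, "income": ">50K"},
]

young, older = split_on_threshold(sample, "age", 27)
young_dist = class_proportions(young, "income")  # bars on the left child
older_dist = class_proportions(older, "income")  # bars on the right child
```

A split is informative when the two children end up with clearly different proportions, which is what the bars on each node let you compare at a glance.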
Let's go back to the root and fly down slowly.
One powerful capability of the Tree Visualizer is to expand subtrees as you get closer to them. Note how the nodes move apart making room for more information as we get closer to them.
You are now looking at MineSet's tree visualizer. You are viewing a regression tree that was automatically built by MineSet to predict the gross income of working adults in the US.
The nine bars on the root node form a histogram that represents the gross income distribution of all adults. The leftmost bar indicates the number of people with the lowest income, the rightmost bar indicates the number of people with the highest income.
By pointing to the bars, we can see text at the upper left that shows the exact proportions: 23% of the sample falls below the 11th percentile of gross income, while about 7% of the sample falls above the 88th percentile.
The regression tree tells us that the most important factor for predicting income is age. It also determines the threshold age of 27 as crucial. People under 27 are represented by the left subtree. The mean income for this group is $14,000.
People over 27 are represented by the right subtree. The mean income for this group is $40,000.
The regression tree shows us that education is an important factor in predicting income for people over 27. The census bureau assigns numbers for different levels of education. 13 represents a bachelor's degree and MineSet determined that this was a crucial threshold.
One can see that the distribution in the two children is very different. People over 27 without a bachelor's degree are represented in the left subtree, which shows a mean gross income of $33,000. The right subtree shows that the bachelor's degree increases the mean income to $57,000.
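For a regression tree, each child is summarized by the mean of the target value rather than by class proportions. The sketch below illustrates that idea under stated assumptions: the function name, the field names, and the four records are hypothetical, and the threshold of 12 simply places education code 13 (a bachelor's degree) and above in the right child.

```python
from statistics import mean

def split_means(records, attr, threshold, target):
    """Return (left_mean, right_mean) of `target` for attr <= threshold vs. attr > threshold."""
    left = [r[target] for r in records if r[attr] <= threshold]
    right = [r[target] for r in records if r[attr] > threshold]
    return (mean(left) if left else None, mean(right) if right else None)

# Hypothetical toy sample, not the data shown in the movie.
sample = [
    {"education": 9, "gross_income": 30000},
    {"education": 10, "gross_income": 36000},
    {"education": 13, "gross_income": 50000},
    {"education": 14, "gross_income": 60000},
]

# Left child: no bachelor's degree (code <= 12). Right child: code 13 or higher.
no_degree_mean, degree_mean = split_means(sample, "education", 12, "gross_income")
```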
You are now looking at a decision table that was automatically built by MineSet to explain which mushrooms are edible and which are poisonous.
The pie in the right pane shows the initial proportions. 52% are edible and 48% are poisonous.
The cakes in the left pane show the distribution broken down by pairs of attributes.
The top level shows the two attributes, odor and spore-print-color, that provide the most information about whether a mushroom is edible or poisonous.
Each cake has a height indicating the number of records. [move slider or tilt on edge] The face of each cake shows the proportions of mushrooms of each class.
For example, this cake [zoom in and select] represents mushrooms whose odor is none and spore-print-color is white. Only 7.6% of these mushrooms are poisonous.
By drilling down it is possible to see what other factors help predict whether a mushroom is edible or poisonous. According to the decision table the next most important factors are habitat and population.
For odorless mushrooms with white spore prints, only two subclasses are poisonous: those that occur as clusters in leaves and those that occur as several isolated mushrooms in the woods.
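The cakes described above amount to grouping records by a pair of attribute values and recording, for each group, a record count (the cake height) and the class proportions (the cake face). A minimal sketch of that grouping follows; the function name, field names, and the five records are hypothetical, not the mushroom data from the movie.

```python
from collections import Counter, defaultdict

def decision_table(records, attrs, label_key):
    """Group records by the values of `attrs`; return count and class proportions per cell."""
    cells = defaultdict(Counter)
    for r in records:
        key = tuple(r[a] for a in attrs)
        cells[key][r[label_key]] += 1
    table = {}
    for key, counts in cells.items():
        total = sum(counts.values())
        table[key] = {
            "count": total,  # cake height
            "proportions": {c: n / total for c, n in counts.items()},  # cake face
        }
    return table

# Hypothetical toy sample, not the data shown in the movie.
mushrooms = [
    {"odor": "none", "spore_print_color": "white", "cls": "edible"},
    {"odor": "none", "spore_print_color": "white", "cls": "edible"},
    {"odor": "none", "spore_print_color": "white", "cls": "edible"},
    {"odor": "none", "spore_print_color": "white", "cls": "poisonous"},
    {"odor": "foul", "spore_print_color": "chocolate", "cls": "poisonous"},
]

table = decision_table(mushrooms, ["odor", "spore_print_color"], "cls")
cake = table[("none", "white")]
```

Drilling down corresponds to calling the same function on the records of one cell with the next pair of attributes, for example habitat and population.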
The pie in the right pane shows the initial probability in the data. By pointing to one of the classes we can see the text on the top left indicates that 16.9% earn over $60,000.
The cakes in the left pane show how each attribute value affects the probability. If the slices are approximately equal, as we can see for the second value of the bottom attribute representing race=white, it indicates that the probability will not change if you know that value for a record.
However, if the slices are not equal, it indicates that factor will influence the posterior distribution.
For example, the first cake for age, representing people under 19, gives strong evidence that the person will make under $10,000. As people age, their income increases, with a slight dropoff after age 61.
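Why equal slices leave the probability unchanged can be seen from Bayes' rule. The sketch below assumes the slices behave like per-class likelihoods of an attribute value; the script does not say which model MineSet uses internally, and the function name, class names, and all numbers are hypothetical.

```python
def posterior(prior, likelihood):
    """Bayes' rule: prior is {class: P(class)}, likelihood is {class: P(value | class)}."""
    unnormalized = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: p / total for c, p in unnormalized.items()}

# Hypothetical prior over two income classes.
prior = {"high": 0.2, "low": 0.8}

# Equal slices: the value is equally likely in both classes, so nothing changes.
equal = posterior(prior, {"high": 0.5, "low": 0.5})

# Unequal slices: the value is far more likely in the "low" class.
skewed = posterior(prior, {"high": 0.1, "low": 0.9})
```

With equal slices the posterior matches the prior; with unequal slices the probability shifts toward the class with the larger slice.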
We can see that the cakes have heights. The height of a cake is proportional to the number of records. By looking at the sex attribute, we can see that there are more males than females in this sample.
Clicking the button near a class name changes the display to a bar display that emphasizes which values give evidence that increases the probability of a record belonging to that class.
You are now looking at MineSet's splat visualizer, which allows you to view large amounts of multi-dimensional data. The data shown represents approximately 50,000 working adults in the US. The three spatial axes show level of education, hours worked per week, and occupation. In addition, age has been mapped to an animation slider.
Rather than displaying a single point for each record, the splat visualizer aggregates records and displays a splat representing all the records in that region. The color of the splat indicates the average gross income. The opacity indicates the number of records.
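The aggregation described above can be thought of as binning records into grid cells and keeping a count and a mean per cell. The following is a minimal sketch under that assumption; the function name, field names, bin sizes, and the integer occupation code are all hypothetical, and this is not how MineSet itself necessarily renders splats.

```python
from collections import defaultdict

def aggregate_splats(records, axes, value_key, bin_size):
    """Bin records into grid cells; return record count and mean of value_key per cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        cell = tuple(int(r[a] // bin_size[a]) for a in axes)
        sums[cell] += r[value_key]
        counts[cell] += 1
    return {
        cell: {"count": counts[cell], "mean": sums[cell] / counts[cell]}
        for cell in counts
    }

# Hypothetical toy sample, not the data shown in the movie.
adults = [
    {"education": 13, "hours": 45, "occupation_code": 2, "income": 60000},
    {"education": 14, "hours": 48, "occupation_code": 2, "income": 80000},
    {"education": 9, "hours": 40, "occupation_code": 5, "income": 30000},
]

splats = aggregate_splats(
    adults,
    axes=["education", "hours", "occupation_code"],
    value_key="income",
    bin_size={"education": 4, "hours": 10, "occupation_code": 1},
)
```

In this picture, the mean would drive the splat's color and the count its opacity, so a cell backed by few records yields a faint splat whose color deserves less confidence.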
By looking at splat densities we can see that most people between 30 and 40 have at least some college education.
We can also see that the occupations professional specialty and executive manager have high densities of people who are highly educated, work long hours, and have high incomes.
By increasing the intensity slider we can see regions where there is very little data. Our confidence in the color shown for these regions is low because the average incomes here are based on fewer records.
By moving the animation slider it is possible to see how age affects the other attributes.
It is also possible to select other splat shapes.
The map visualizer allows you to view data containing geographical relationships. The data shown represents births in the Netherlands over the different regions. The height of each region represents the number of births. The color, specified at the bottom of the window, indicates the population density of the region. Moving the mouse over a region displays the information for that region.
The two independent variables, age and year, allow you to slice through the data. For example, moving the age slider changes the map. By plotting a path you can animate over the different age groups or years or a combination of both.
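Slicing and path animation can be pictured as filtering the records at one (age, year) point and then stepping through a sequence of such points. The sketch below is only an illustration; the function names, field names, region labels, and numbers are hypothetical.

```python
def slice_records(records, **fixed):
    """Keep only the records whose fields equal the given slider values."""
    return [r for r in records if all(r[k] == v for k, v in fixed.items())]

def animate(records, path):
    """Yield one slice of the data per (age, year) point on the path."""
    for age, year in path:
        yield (age, year), slice_records(records, age=age, year=year)

# Hypothetical toy sample, not the data shown in the movie.
births = [
    {"region": "A", "age": 20, "year": 1990, "births": 50},
    {"region": "A", "age": 25, "year": 1990, "births": 120},
    {"region": "B", "age": 20, "year": 1990, "births": 40},
    {"region": "A", "age": 25, "year": 1991, "births": 115},
]

# A path that first moves along age, then along year.
frames = list(animate(births, [(20, 1990), (25, 1990), (25, 1991)]))
```

Each frame holds the records that would be drawn for that slider position.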
As expected, the number of births increases as women age, then decreases as they get past child-bearing age.
The scatter visualizer allows you to look at multi-dimensional data. The data shown represents births in the Netherlands. The axes are the region name, the region population density, and the number of births per 1,000 women. Moving the mouse over a cube displays the information for that cube.
The color of each cube represents the region's total population. There are two sliders on the right side that allow users to slice through the data at different values of the independent variables. In our example, these are the women's age and the year the survey was done.
The data shows women at age 20. To switch to 25, we click on a higher point. We can also animate how the data change over time by plotting a path in the panel. By using the VCR buttons, we can see that the number of births increases as women get older, then decreases as women get past child-bearing age.
The scatter visualizer allows you to see 8 dimensions: the three axes, the shapes of entities, their color, their size, and two independent dimensions for animation.